feat: enhance subscription management with metrics and reconnection #621
base: master
Conversation
Actionable comments posted: 1
♻️ Duplicate comments (1)
magicblock-chainlink/src/remote_account_provider/mod.rs (1)
734-758: Consider reinserting evicted entry on rollback for complete atomicity.

The rollback logic at lines 742 and 753 removes the newly added `pubkey` from the LRU cache when eviction unsubscribe or new subscription fails. However, the `evicted` entry (which was successfully subscribed before eviction) is not reinserted into the cache. This creates a state where:

- LRU cache: neither `evicted` nor `pubkey` is present
- Pubsub client: still subscribed to `evicted` (and potentially `pubkey`)

A past review comment suggested reinserting the evicted entry to fully restore the prior state. After removing `pubkey` at line 742, capacity is available, so re-adding `evicted` won't cause another eviction:

```rust
if let Err(err) = self.pubsub_client.unsubscribe(evicted).await {
    warn!("Failed to unsubscribe from pubsub for evicted account {evicted}: {err:?}");
    self.lrucache_subscribed_accounts.remove(pubkey);
    self.lrucache_subscribed_accounts.add(evicted); // Restore prior state
    return Err(err);
}
```

Similarly at line 753 after subscribe failure. This ensures the cache matches the pubsub subscriptions even when partial operations fail.
Based on learnings
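The suggested rollback can be exercised in isolation. Below is a hypothetical, self-contained sketch of the pattern — a toy `LruSet` and a stubbed `unsubscribe` stand in for the crate's real `lrucache_subscribed_accounts` and pubsub client: on a failed unsubscribe of the evicted entry, the new key is removed and the evicted key re-added, restoring the prior state.

```rust
use std::collections::VecDeque;

// Toy LRU set standing in for the real lrucache_subscribed_accounts.
struct LruSet {
    cap: usize,
    keys: VecDeque<String>,
}

impl LruSet {
    fn new(cap: usize) -> Self {
        Self { cap, keys: VecDeque::new() }
    }
    /// Adds `key`; returns the evicted key if capacity was exceeded.
    fn add(&mut self, key: &str) -> Option<String> {
        self.keys.push_back(key.to_string());
        if self.keys.len() > self.cap {
            self.keys.pop_front()
        } else {
            None
        }
    }
    fn remove(&mut self, key: &str) {
        self.keys.retain(|k| k != key);
    }
    fn contains(&self, key: &str) -> bool {
        self.keys.iter().any(|k| k == key)
    }
}

/// Stand-in for the pubsub unsubscribe call; `fail` forces the error path.
fn unsubscribe(_key: &str, fail: bool) -> Result<(), String> {
    if fail { Err("ws error".to_string()) } else { Ok(()) }
}

/// Adds `pubkey`; if that evicts an entry whose unsubscribe fails, rolls the
/// cache back so `evicted` stays tracked and `pubkey` does not.
fn subscribe_with_rollback(
    cache: &mut LruSet,
    pubkey: &str,
    fail_unsub: bool,
) -> Result<(), String> {
    if let Some(evicted) = cache.add(pubkey) {
        if let Err(err) = unsubscribe(&evicted, fail_unsub) {
            cache.remove(pubkey);
            cache.add(&evicted); // restore prior state; capacity is free again
            return Err(err);
        }
    }
    Ok(())
}

fn main() {
    let mut cache = LruSet::new(1);
    cache.add("evicted_key");
    let res = subscribe_with_rollback(&mut cache, "new_key", true);
    assert!(res.is_err());
    assert!(cache.contains("evicted_key")); // prior state restored
    assert!(!cache.contains("new_key"));
    println!("rollback restored prior state");
}
```

The point of the sketch is the invariant the review asks for: after a failed partial operation, the cache membership again mirrors the still-active subscriptions.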
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
magicblock-chainlink/src/remote_account_provider/mod.rs (28 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
📚 Learning: 2025-11-07T14:20:31.457Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
Applied to files:
magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-10-21T14:00:54.642Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Applied to files:
magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-10-26T16:53:29.820Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 587
File: magicblock-chainlink/src/remote_account_provider/mod.rs:134-0
Timestamp: 2025-10-26T16:53:29.820Z
Learning: In magicblock-chainlink/src/remote_account_provider/mod.rs, the `Endpoint::separate_pubsub_url_and_api_key()` method uses `split_once("?api-key=")` because the api-key parameter is always the only query parameter right after `?`. No additional query parameter parsing is needed for this use case.
Applied to files:
magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.
Applied to files:
magicblock-chainlink/src/remote_account_provider/mod.rs
🧬 Code graph analysis (1)
magicblock-chainlink/src/remote_account_provider/mod.rs (5)
magicblock-metrics/src/metrics/mod.rs (5)
- inc_account_fetches_failed (399-401)
- inc_account_fetches_found (403-405)
- inc_account_fetches_not_found (407-409)
- inc_account_fetches_success (395-397)
- set_monitored_accounts_count (357-359)

magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs (3)
- new (46-54)
- new (311-322)
- try_new_from_url (165-180)

magicblock-chainlink/src/remote_account_provider/lru_cache.rs (4)
- new (25-34)
- pubkeys (38-41)
- pubkeys (129-135)
- len (117-123)

magicblock-chainlink/src/submux/mod.rs (1)
- new (147-158)

magicblock-chainlink/src/remote_account_provider/config.rs (2)
- default (65-72)
- try_new_with_metrics (27-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run_make_ci_lint
- GitHub Check: run_make_ci_test
- GitHub Check: Build Project
🔇 Additional comments (10)
magicblock-chainlink/src/remote_account_provider/mod.rs (10)
1-8: LGTM: Import additions support new features.

The new imports (`hash_map::Entry`, `task`, `time`, and metrics functions) are necessary for the Entry API usage, background metrics updater, and metrics integration.

Also applies to: 35-36, 49-53
200-270: LGTM: Background metrics updater implementation is sound.

The periodic task correctly compares LRU cache and pubsub client subscription counts, logs discrepancies with detailed debugging info, and updates the monitored accounts gauge. The conditional debug logging is a good optimization to avoid unnecessary work in production.
283-312: LGTM: Constructor properly initializes LRU cache and optional metrics updater.

The initialization correctly handles the NonZeroUsize capacity constraint and conditionally starts the background metrics updater based on configuration. The SAFETY comment appropriately documents the unwrap.
634-641: LGTM: Entry API correctly appends to existing waiters.

The use of `Entry::Occupied` to append senders and `Entry::Vacant` to insert new entries ensures that concurrent callers waiting for the same account will all receive the fetch result, fixing the previous issue where existing waiters were dropped.
801-824: LGTM: Unsubscribe maintains consistency between LRU cache and pubsub client.

The function correctly checks LRU cache state before attempting pubsub unsubscription, and only removes from the cache after a successful pubsub unsubscribe. If pubsub unsubscribe fails, the entry remains in the LRU cache to maintain consistency with the actual subscription state.
849-866: LGTM: Error notification helper is well-structured.

The `notify_error` helper correctly removes all pending requests for the failed accounts, increments the failure metric, and sends error results to all waiters. This ensures no requests are left hanging when a fetch fails.
874-886: LGTM: Retry macro implementation is correct.

The retry macro properly decrements the counter, checks max retries, invokes the error notification helper on exhaustion, and uses `continue` to loop back for another attempt with a 400ms delay. The logic is sound.
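The loop shape described above can be sketched stand-alone. This is an illustrative synchronous analogue — std `sleep` in place of tokio's delay, a generic function in place of the crate's `retry!` macro — not the actual implementation.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Retries `op` up to `max_retries` attempts (must be >= 1) with a fixed
/// delay between attempts, mirroring the reviewed macro's shape:
/// decrement the counter, check exhaustion, otherwise wait and loop back.
fn retry_with_delay<T, E>(
    max_retries: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut remaining = max_retries;
    loop {
        match op() {
            Ok(val) => return Ok(val),
            Err(err) => {
                remaining -= 1;
                if remaining == 0 {
                    // The real macro notifies waiters and bumps the
                    // failure metric here before bailing out.
                    return Err(err);
                }
                sleep(delay); // the reviewed code waits 400ms here
            }
        }
    }
}

fn main() {
    let mut attempts = 0;
    let res: Result<u32, &str> =
        retry_with_delay(3, Duration::from_millis(1), || {
            attempts += 1;
            if attempts < 3 { Err("not yet") } else { Ok(42) }
        });
    assert_eq!(res, Ok(42));
    assert_eq!(attempts, 3); // succeeded on the third attempt
}
```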
937-937: LGTM: Helius compatibility for context slot errors.

Good catch to handle both the standard `JSON_RPC_SERVER_ERROR_MIN_CONTEXT_SLOT_NOT_REACHED` and the Helius-specific `HELIUS_CONTEXT_SLOT_NOT_REACHED` error codes. This ensures compatibility across different RPC providers.
983-1004: Verify metrics semantics for "mark empty if not found" accounts.

At lines 991-1003, accounts that are not found but are in the `mark_empty_if_not_found` list are counted in `found_count` (line 992) rather than `not_found_count`. This means the `account_fetches_found` metric includes accounts that were fabricated as empty rather than actually found on-chain.

This might be intentional to reflect that the caller received a usable account data structure, but it could be misleading when analyzing metrics to understand actual on-chain state. Consider whether these should increment a separate counter (e.g., `account_fetches_marked_empty`) or be clarified in documentation.
976-1015: LGTM: Metrics tracking for fetch results.

The metrics correctly track found vs. not-found accounts and increment the success counter for the entire batch. The counts are updated based on whether each account was present in the RPC response.
```rust
            // When an account is not present for the desired
            // min-context slot then we normally get the below
            // handled `RpcResponseError`, but may also get the
            // following error from the RPC.
            // See test::ixtest_existing_account_for_future_slot
            // ```
            // RpcError(
            //     ForUser(
            //         "AccountNotFound: \
            //          pubkey=DaeruQ4SukTQaJA5muyv51MQZok7oaCAF8fAW19mbJv5: \
            //          pubkey=DaeruQ4SukTQaJA5muyv51MQZok7oaCAF8fAW19mbJv5: \
            //          RPC response error -32016: \
            //          Minimum context slot has not been reached; ",
            //     ),
            // )
            // ```
            retry!("Fetching accounts failed: {rpc_user_err:?}");
        }
    }
```
🧹 Nitpick | 🔵 Trivial
Consider narrowing the retry scope for ForUser errors.
The code retries on all RpcError::ForUser errors, but the comment at lines 915-919 specifically mentions the "AccountNotFound: ... Minimum context slot has not been reached" error as the expected case. Retrying on all ForUser errors might lead to unnecessary retries for unrelated errors (e.g., authentication failures, rate limits, or other user-facing errors).
Consider checking if the error message contains "Minimum context slot" or "AccountNotFound" before retrying, or at least log which type of ForUser error triggered the retry.
🤖 Prompt for AI Agents
In magicblock-chainlink/src/remote_account_provider/mod.rs around lines 915 to
931, the code currently retries on any RpcError::ForUser but should only retry
for the specific "AccountNotFound" / "Minimum context slot" case described in
the comment; change the retry logic to inspect the ForUser error string (or enum
variant if available) and only call retry when the message contains "Minimum
context slot" or "AccountNotFound", otherwise propagate or log the error (and
include the exact error when logging) so unrelated user-facing errors do not
trigger retries.
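One minimal way to implement the suggested narrowing is a message predicate, sketched here as a hypothetical helper — the matched substrings come from the error quoted in the code comment, and the function name is illustrative:

```rust
/// Returns true only for the ForUser messages the review identifies as
/// retryable (the min-context-slot / account-not-found flavour); other
/// user-facing errors (auth failures, rate limits) should propagate.
fn is_retryable_for_user(msg: &str) -> bool {
    msg.contains("Minimum context slot") || msg.contains("AccountNotFound")
}

fn main() {
    assert!(is_retryable_for_user(
        "AccountNotFound: pubkey=...: RPC response error -32016: \
         Minimum context slot has not been reached; "
    ));
    assert!(!is_retryable_for_user("rate limit exceeded"));
    println!("predicate behaves as expected");
}
```

In the real match arm, the non-retryable branch would log the full error and propagate it instead of invoking `retry!`.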
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
magicblock-metrics/src/metrics/mod.rs (9 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
📚 Learning: 2025-10-21T14:00:54.642Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Applied to files:
magicblock-metrics/src/metrics/mod.rs
🧬 Code graph analysis (1)
magicblock-metrics/src/metrics/mod.rs (2)
magicblock-chainlink/src/remote_account_provider/mod.rs (1)
- new (275-330)

magicblock-aperture/src/state/subscriptions.rs (4)
- new (342-355)
- new (410-413)
- count (335-337)
- count (370-372)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run_make_ci_test
- GitHub Check: Build Project
- GitHub Check: run_make_ci_lint
🔇 Additional comments (4)
magicblock-metrics/src/metrics/mod.rs (4)
281-284: LGTM!

The new account fetch metrics are properly registered following the established pattern.
395-409: LGTM!

The increment helper functions are well-designed:
- Consistent naming with their corresponding metrics
- Appropriate use of `inc_by(count)` for batch increments
- Clean and straightforward implementation
357-359: All call sites verified as updated.

The function rename is complete — no remaining references to the old function name, and the import and call sites in magicblock-chainlink have been correctly updated to use `set_monitored_accounts_count`.
241-287: Removal of SUBSCRIPTIONS_COUNT_GAUGE is safe and complete.

Verification confirms that `SUBSCRIPTIONS_COUNT_GAUGE` has been fully removed with no orphaned references. The replacement metric `RPC_WS_SUBSCRIPTIONS_COUNT` is properly defined, registered, and actively used throughout the codebase (imported and utilized in magicblock-aperture/src/state/subscriptions.rs). No breaking changes detected.
```rust
    // Account fetch results from network (RPC)
    pub static ref ACCOUNT_FETCHES_SUCCESS_COUNT: IntCounter =
        IntCounter::new(
            "account_fetches_success_count",
            "Total number of successful network \
             account fetches",
        )
        .unwrap();

    pub static ref ACCOUNT_FETCHES_FAILED_COUNT: IntCounter =
        IntCounter::new(
            "account_fetches_failed_count",
            "Total number of failed network account fetches \
             (RPC errors)",
        )
        .unwrap();

    pub static ref ACCOUNT_FETCHES_FOUND_COUNT: IntCounter =
        IntCounter::new(
            "account_fetches_found_count",
            "Total number of network account fetches that \
             found an account",
        )
        .unwrap();

    pub static ref ACCOUNT_FETCHES_NOT_FOUND_COUNT: IntCounter =
        IntCounter::new(
            "account_fetches_not_found_count",
            "Total number of network account fetches where \
             account was not found",
        )
        .unwrap();
```
🧹 Nitpick | 🔵 Trivial
Clarify the semantic relationship between SUCCESS/FAILED and FOUND/NOT_FOUND metrics.
The four metrics track two orthogonal dimensions:
- SUCCESS_COUNT / FAILED_COUNT: Whether the RPC call itself succeeded or failed
- FOUND_COUNT / NOT_FOUND_COUNT: Whether the account exists on-chain
This means a successful RPC call that returns "account does not exist" would increment both SUCCESS_COUNT and NOT_FOUND_COUNT. While this design is reasonable, the semantic overlap could confuse metric consumers.
Consider adding a comment block above these metric definitions explaining:
- The two orthogonal dimensions being tracked
- Example scenarios (e.g., "A successful RPC call that finds no account increments both SUCCESS and NOT_FOUND")
- The relationship: SUCCESS = FOUND + NOT_FOUND (approximately, barring RPC errors)
```rust
// Account fetch results from network (RPC)
// These metrics track two dimensions:
// 1. RPC call outcome: SUCCESS (call succeeded) vs FAILED (RPC error)
// 2. Account existence: FOUND (account exists) vs NOT_FOUND (account doesn't exist)
// Note: A successful RPC call increments SUCCESS and either FOUND or NOT_FOUND.
// Therefore: SUCCESS ≈ FOUND + NOT_FOUND (modulo edge cases)
pub static ref ACCOUNT_FETCHES_SUCCESS_COUNT: IntCounter =
```

🤖 Prompt for AI Agents
magicblock-metrics/src/metrics/mod.rs around lines 164 to 195: add a clarifying
comment block immediately above the four ACCOUNT_FETCHES_* metric definitions
that explains these metrics measure two orthogonal dimensions (RPC outcome:
SUCCESS vs FAILED, and account existence: FOUND vs NOT_FOUND), give one or two
short examples (e.g., a successful RPC that finds no account increments SUCCESS
and NOT_FOUND), and state the relationship (SUCCESS ≈ FOUND + NOT_FOUND modulo
RPC errors); place this concise explanatory text in plain English so metric
consumers understand the overlap and how to interpret the counters.
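To make the invariant concrete, here is a toy model using plain integers in place of the Prometheus counters. It counts per account per successful batch — one plausible reading of the reviewed code, not a confirmed detail.

```rust
// Toy stand-in for the four ACCOUNT_FETCHES_* counters.
#[derive(Default)]
struct FetchMetrics {
    success: u64,
    failed: u64,
    found: u64,
    not_found: u64,
}

impl FetchMetrics {
    /// Records one successful RPC batch: `found` accounts existed on chain,
    /// `not_found` did not. The RPC call itself succeeded either way.
    fn record_success(&mut self, found: u64, not_found: u64) {
        self.success += found + not_found;
        self.found += found;
        self.not_found += not_found;
    }
    /// Records accounts whose fetch hit an RPC error.
    fn record_failure(&mut self, count: u64) {
        self.failed += count;
    }
}

fn main() {
    let mut m = FetchMetrics::default();
    m.record_success(3, 2); // 3 accounts found, 2 absent; RPC succeeded
    m.record_failure(1);    // one fetch hit an RPC error
    // The relationship SUCCESS = FOUND + NOT_FOUND holds for successful calls;
    // failed calls contribute to neither side of it.
    assert_eq!(m.success, m.found + m.not_found);
    assert_eq!((m.success, m.failed), (5, 1));
    println!("SUCCESS = FOUND + NOT_FOUND holds");
}
```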
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
magicblock-chainlink/src/chainlink/blacklisted_accounts.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run_make_ci_test
- GitHub Check: run_make_ci_lint
- GitHub Check: Build Project
🔇 Additional comments (1)
magicblock-chainlink/src/chainlink/blacklisted_accounts.rs (1)
69-69: Verify the intent and scope of blacklisting the SPL Token program.

The implementation correctly adds the SPL Token program to the blacklist. However, please confirm:
- This blacklists only the Token program account itself (TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA), not individual token accounts created by the program.
- The Token-2022/Token Extensions program (TokenzQdBNbLqP5VEhdkAS6EPFLC1PHnBqCXEpPxuEb) remains unblacklisted—is this intentional?
- What subscription management issue prompted this addition?
```rust
const NATIVE_TOKEN_PROGRAM_ID: Pubkey =
    solana_sdk::pubkey!("TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA");
```
🧹 Nitpick | 🔵 Trivial
Consider adding a comment explaining the blacklisting rationale.
Similar to NATIVE_SOL_ID (lines 10-11), consider adding a comment explaining why the SPL Token program account is being blacklisted. This helps future maintainers understand the intent.
Example:

```diff
+ // The SPL Token program account should not be monitored directly
+ // (individual token accounts are fine, but the program account itself is blacklisted)
  const NATIVE_TOKEN_PROGRAM_ID: Pubkey =
      solana_sdk::pubkey!("TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA");
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
// The SPL Token program account should not be monitored directly
// (individual token accounts are fine, but the program account itself is blacklisted)
const NATIVE_TOKEN_PROGRAM_ID: Pubkey =
    solana_sdk::pubkey!("TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA");
```
🤖 Prompt for AI Agents
In magicblock-chainlink/src/chainlink/blacklisted_accounts.rs around lines 51 to
52, add a concise comment above NATIVE_TOKEN_PROGRAM_ID explaining why the SPL
Token program account is blacklisted (for example: to prevent
token-program-driven account creation/interaction or to avoid treating the token
program as a user-owned account in checks), mirroring the style and level of
detail used for NATIVE_SOL_ID on lines 10–11; keep the comment short,
action-oriented, and focused on intent for future maintainers.
* master: fix: fetching CUs for metrics (#623)
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
magicblock-api/src/tickers.rs (1 hunks)
magicblock-metrics/src/service.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
📚 Learning: 2025-11-07T14:20:31.457Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
Applied to files:
magicblock-metrics/src/service.rs
🧬 Code graph analysis (1)
magicblock-api/src/tickers.rs (2)
magicblock-api/src/magic_validator.rs (1)
- ledger (735-737)

magicblock-metrics/src/metrics/mod.rs (1)
- set_ledger_size (297-299)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build Project
- GitHub Check: run_make_ci_test
- GitHub Check: run_make_ci_lint
🔇 Additional comments (1)
magicblock-api/src/tickers.rs (1)
111-130: LGTM! Clean implementation with proper cancellation handling.

The simplified implementation correctly:
- Handles ledger storage size retrieval errors with appropriate logging
- Uses `tokio::select!` for graceful cancellation
- Clones the `Arc<Ledger>` before moving into the async block
magicblock-api/src/tickers.rs
Outdated
```rust
#[allow(unused_variables)]
pub fn init_system_metrics_ticker(
    tick_duration: Duration,
    ledger: &Arc<Ledger>,
    accountsdb: &Arc<AccountsDb>,
    token: CancellationToken,
) -> tokio::task::JoinHandle<()> {
```
🧩 Analysis chain
Verify the unused accountsdb parameter is intentional.
The accountsdb parameter is no longer used in the simplified implementation, yet it remains in the function signature. While the #[allow(unused_variables)] attribute acknowledges this, keeping unused parameters can mislead callers about the function's behavior.
Please confirm:
- Is this parameter kept for API compatibility during a transition period?
- Should it be removed in a future breaking change?
- Should the function documentation explicitly note that `accountsdb` is currently unused but kept for compatibility?
To verify the impact of removing this parameter, run:
🏁 Script executed:
```shell
#!/bin/bash
# Search for all call sites of init_system_metrics_ticker
rg -n -A5 -B2 'init_system_metrics_ticker\s*\(' --type rust
```

Length of output: 1143
---
🏁 Script executed:
```shell
#!/bin/bash
# Get the complete function implementation to see if accountsdb is used
sed -n '104,130p' magicblock-api/src/tickers.rs
```

Length of output: 838
🏁 Script executed:
```shell
#!/bin/bash
# Search comprehensively for all references to init_system_metrics_ticker (function calls only, not definition)
rg -n 'init_system_metrics_ticker' --type rust | grep -v '^\s*pub fn init_system_metrics_ticker'
```

Length of output: 353
🏁 Script executed:
```shell
#!/bin/bash
# Check the call site in magic_validator.rs to see the full context
sed -n '220,235p' magicblock-api/src/magic_validator.rs
```

Length of output: 466
Remove the unused accountsdb parameter from function signature and call site.
The accountsdb parameter is declared but never used in the function implementation. The #[allow(unused_variables)] attribute is suppressing a compiler warning that indicates incomplete refactoring.
Required fixes:
- Remove `accountsdb: &Arc<AccountsDb>,` from the function signature at magicblock-api/src/tickers.rs:108
- Remove `&accountsdb,` from the function call at magicblock-api/src/magic_validator.rs:227
🤖 Prompt for AI Agents
In magicblock-api/src/tickers.rs around lines 104–110, the function
init_system_metrics_ticker declares an unused parameter accountsdb and silences
the warning with #[allow(unused_variables)]; remove the accountsdb:
&Arc<AccountsDb>, parameter from the function signature and delete the
#[allow(unused_variables)] attribute, then update all call sites (specifically
remove the &accountsdb, argument at magicblock-api/src/magic_validator.rs:227)
so the function is invoked with the remaining parameters; also remove any
now-unused AccountsDb imports or bindings to keep the code clean.
magicblock-api/src/tickers.rs
Outdated
```rust
/*
#[allow(unused_variables)]
pub fn init_system_metrics_ticker_old(
    tick_duration: Duration,
    ledger: &Arc<Ledger>,
    accountsdb: &Arc<AccountsDb>,
    token: CancellationToken,
) -> tokio::task::JoinHandle<()> {
    fn try_set_ledger_counts(ledger: &Ledger) {
        macro_rules! try_set_ledger_count {
            ($name:ident) => {
                paste::paste! {
                    match ledger.[< count_ $name >]() {
                        Ok(count) => {
                            metrics::[< set_ledger_ $name _count >](count);
                        }
                        Err(err) => warn!(
                            "Failed to get ledger {} count: {:?}",
                            stringify!($name),
                            err
                        ),
                    }
                }
            };
        }
        try_set_ledger_count!(block_times);
        try_set_ledger_count!(blockhashes);
        try_set_ledger_count!(slot_signatures);
        try_set_ledger_count!(address_signatures);
        try_set_ledger_count!(transaction_status);
        try_set_ledger_count!(transaction_successful_status);
        try_set_ledger_count!(transaction_failed_status);
        try_set_ledger_count!(transactions);
        try_set_ledger_count!(transaction_memos);
        try_set_ledger_count!(perf_samples);
        try_set_ledger_count!(account_mod_data);
    }
    fn try_set_ledger_storage_size(ledger: &Ledger) {
        match ledger.storage_size() {
            Ok(byte_size) => metrics::set_ledger_size(byte_size),
            Err(err) => warn!("Failed to get ledger storage size: {:?}", err),
        }
    }
    fn set_accounts_storage_size(bank: &Bank) {
        let byte_size = bank.accounts_db_storage_size();
        metrics::set_accounts_size(byte_size);
    }
    fn set_accounts_count(bank: &Bank) {
        metrics::set_accounts_count(bank.accounts_db.get_accounts_count());
    }
    let ledger = ledger.clone();
    let bank = bank.clone();
    tokio::task::spawn(async move {
        loop {
            tokio::select! {
                _ = tokio::time::sleep(tick_duration) => {
                    try_set_ledger_storage_size(&ledger);
                    set_accounts_storage_size(&bank);
                    try_set_ledger_counts(&ledger);
                    set_accounts_count(&bank);
                },
                _ = token.cancelled() => {
                    break;
                }
            }
        }
    })
    tokio::task::spawn(async move {})
}
*/
```
🧹 Nitpick | 🔵 Trivial
Consider removing the commented-out old implementation.
The 72-line commented block preserves the previous multi-metric implementation. While this may serve as temporary reference during the transition, large commented sections can clutter the codebase and create maintenance overhead.
Recommendations:
- If this is a temporary reference during a transition period, add a `TODO` comment with a timeline for removal (e.g., "TODO: Remove after v2.0 release").
- If the old implementation may be needed for rollback, consider preserving it in a separate branch or commit rather than in comments.
- If permanently removed, delete the commented code—Git history will preserve it if needed for future reference.
🤖 Prompt for AI Agents
In magicblock-api/src/tickers.rs around lines 132 to 204, there is a 72-line
commented-out old implementation (init_system_metrics_ticker_old) cluttering the
file; remove the large commented block or replace it with a short TODO note. If
you need a temporary reference during the transition, replace the commented code
with a single-line TODO including a removal timeline (e.g., "TODO: remove after
vX.Y") or move the old implementation to a separate commit/branch, otherwise
delete the commented block entirely so Git history retains the prior code.
```rust
// We must consume the body fully to keep the connection alive. We
// iterate over all chunks and simply drop them. This prevents garbage
// data of previous requests from being stuck in connection buffer.
let mut body = req.into_body();
while (body.frame().await).is_some() {}

result
```
🧩 Analysis chain
Add timeout and error handling to body draining loop.
The body draining logic has the following concerns:
- DoS vulnerability: the loop has no timeout, allowing a malicious or slow client to tie up the server indefinitely by sending a large body slowly. Consider adding a timeout (e.g., using `tokio::time::timeout`) or a maximum frame count limit.
- Silent error handling: errors from `body.frame().await` are silently ignored. At minimum, log errors for debugging purposes.
- Performance: the body is drained for every request, including GET requests that typically have no body. While the overhead is minimal for empty bodies, consider checking `req.headers().get("content-length")` first to skip draining when there's no body.
Apply this diff to add timeout and error logging:
- // We must consume the body fully to keep the connection alive. We
- // iterate over all chunks and simply drop them. This prevents garbage
- // data of previous requests from being stuck in connection buffer.
- let mut body = req.into_body();
- while (body.frame().await).is_some() {}
+ // We must consume the body fully to keep the connection alive. We
+ // iterate over all chunks and simply drop them. This prevents garbage
+ // data of previous requests from being stuck in connection buffer.
+ let mut body = req.into_body();
+ let drain_timeout = tokio::time::Duration::from_secs(30);
+ match tokio::time::timeout(drain_timeout, async {
+ while let Some(frame_result) = body.frame().await {
+ if let Err(e) = frame_result {
+ warn!("Error draining request body: {:?}", e);
+ }
+ }
+ })
+ .await
+ {
+ Err(_) => warn!("Timeout draining request body"),
+ Ok(_) => {}
+ }Additionally, verify whether this body draining is necessary with the current version of Hyper. Modern versions typically handle request body consumption automatically for keep-alive connections:
🌐 Web query:
Does Hyper automatically consume request bodies for HTTP/1.1 keep-alive connections?
💡 Result:
Short answer: No. Hyper does not implicitly drain/consume an unread request body for you — if you want the TCP connection to be reused for HTTP/1.1 keep‑alive you must read (or explicitly discard) the request body before returning a response, or configure your server/connection behavior accordingly. See hyper's docs on request bodies and collecting them, and the HTTP/1.1 messaging guidance that servers should read or discard a body if they intend to keep a connection open. [1][2][3]
Sources:
- hyper guide: request/response bodies and collecting the request body. [1]
- hyper server/conn::Http options (http1_keep_alive behavior). [2]
- HTTP messaging spec: servers SHOULD read/discard request bodies if they will keep the transport open (HTTP/1.1 guidance). [3]
Add timeout and error handling to body draining loop to prevent DoS and improve observability.
The body draining logic correctly implements HTTP/1.1 keep-alive support, as Hyper requires explicit body consumption before reusing the connection. However, the implementation needs hardening:
- DoS vulnerability: the loop has no timeout, allowing a malicious or slow client to tie up the server indefinitely by sending a large body slowly. Add a timeout (e.g., using `tokio::time::timeout`) or a maximum frame count limit.
- Silent error handling: errors from `body.frame().await` are silently ignored, making debugging difficult. At minimum, log errors.
- Performance optimization (optional): skip body draining for requests without a body by checking `req.headers().get("content-length")` first.
Apply this diff to add timeout and error logging:
- // We must consume the body fully to keep the connection alive. We
- // iterate over all chunks and simply drop them. This prevents garbage
- // data of previous requests from being stuck in connection buffer.
- let mut body = req.into_body();
- while (body.frame().await).is_some() {}
+ // We must consume the body fully to keep the connection alive. We
+ // iterate over all chunks and simply drop them. This prevents garbage
+ // data of previous requests from being stuck in connection buffer.
+ let mut body = req.into_body();
+ let drain_timeout = tokio::time::Duration::from_secs(30);
+ match tokio::time::timeout(drain_timeout, async {
+ while let Some(frame_result) = body.frame().await {
+ if let Err(e) = frame_result {
+ warn!("Error draining request body: {:?}", e);
+ }
+ }
+ })
+ .await
+ {
+ Err(_) => warn!("Timeout draining request body"),
+ Ok(_) => {}
+ }Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In magicblock-metrics/src/service.rs around lines 131 to 137, the body-draining
loop needs hardening: wrap the await in a tokio::time::timeout (e.g., a few
seconds) to avoid slow-client DoS, track a maximum frame count (e.g., a
reasonable upper bound) to prevent infinite loops, and check whether the request
actually has a body first (inspect Content-Length or absence of
transfer-encoding) to skip draining when unnecessary. On timeout or any Err from
body.frame().await, log the error with context and break the loop; on Ok(None)
stop normally. Ensure the timeout and frame count are configurable constants and
that logging includes request id/context for observability.
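The timeout-plus-frame-budget idea can be illustrated without Hyper or Tokio at all. The following is a hypothetical std-only sketch: `drain_with_budget` and its plain-`Vec` frames are stand-ins for the real body stream, showing only the bounded-drain and error-logging behavior the prompt asks for.

```rust
// Minimal sketch of draining a request body under a maximum frame budget.
// `frames` stands in for hyper's body stream; errors are logged, not propagated.
fn drain_with_budget<I>(frames: I, max_frames: usize) -> Result<usize, String>
where
    I: IntoIterator<Item = Result<Vec<u8>, String>>,
{
    let mut drained = 0;
    for frame in frames {
        if drained >= max_frames {
            // Bail out instead of letting a slow or malicious client
            // feed us frames forever.
            return Err(format!("gave up after {drained} frames"));
        }
        match frame {
            Ok(_chunk) => drained += 1, // drop the chunk
            Err(e) => {
                // Log and keep draining; one bad frame should not leave
                // garbage in the connection buffer.
                eprintln!("error draining request body: {e}");
                drained += 1;
            }
        }
    }
    Ok(drained)
}

fn main() {
    let body = vec![
        Ok::<Vec<u8>, String>(vec![1; 4]),
        Err("oops".to_string()),
        Ok(vec![2; 4]),
    ];
    // Three frames fit comfortably within a budget of ten.
    assert_eq!(drain_with_budget(body, 10), Ok(3));

    // A stream longer than the budget is cut off with an error.
    let endless = (0..100).map(|_| Ok::<Vec<u8>, String>(vec![0]));
    assert!(drain_with_budget(endless, 5).is_err());
}
```

In the real handler the same bound would sit inside the `tokio::time::timeout` wrapper shown in the diff, so both wall-clock time and frame count are capped.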
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
test-integration/test-schedule-intent/tests/test_schedule_intents.rs (1)
128-141: Add verification after second undelegation at line 140.

The test verifies undelegation after the first `schedule_intent` (line 128), but the second `schedule_intent` at line 132 with `Some(vec![102])` also triggers undelegation, since any `Some(...)` value in `counter_diffs` sets `is_undelegate = true`. For consistency and completeness, add `verify_undelegation_in_ephem_via_owner(&[payer.pubkey()], &ctx);` after the assertion block ends at line 140.

magicblock-chainlink/src/remote_account_provider/mod.rs (2)
911-970: Retry scope for ForUser errors could be narrower.

The code retries on all `RpcError::ForUser` errors (line 930), but the comment (lines 915-919) suggests only the "AccountNotFound: Minimum context slot has not been reached" case requires retries. Other ForUser errors (authentication failures, rate limits, etc.) won't resolve with retries and waste retry budget.

This is acceptable for a background task with limited retries, but consider narrowing the retry condition:

```diff
 RpcError::ForUser(ref rpc_user_err) => {
-    // When an account is not present for the desired
-    // min-context slot then we normally get the below
-    // handled `RpcResponseError`, but may also get the
-    // following error from the RPC.
-    // See test::ixtest_existing_account_for_future_slot
-    // RpcError(
-    //     ForUser(
-    //         "AccountNotFound: \
-    //          pubkey=DaeruQ4SukTQaJA5muyv51MQZok7oaCAF8fAW19mbJv5: \
-    //          RPC response error -32016: \
-    //          Minimum context slot has not been reached; ",
-    //     ),
-    // )
-    retry!("Fetching accounts failed: {rpc_user_err:?}");
+    // Retry only for context slot issues
+    if rpc_user_err.contains("Minimum context slot")
+        || rpc_user_err.contains("AccountNotFound") {
+        retry!("Fetching accounts failed: {rpc_user_err:?}");
+    } else {
+        let err_msg = format!(
+            "RpcError fetching accounts {}: ForUser({rpc_user_err})",
+            pubkeys_str(&pubkeys)
+        );
+        notify_error(&err_msg);
+        return;
+    }
 }
```
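The narrowed condition boils down to a string predicate on the RPC error message. A standalone, hypothetical sketch (`should_retry` is illustrative and not code from the PR; the message fragments mirror the ones quoted above):

```rust
// Hypothetical predicate: is a ForUser RPC error worth retrying?
// Only context-slot races and transient AccountNotFound cases qualify;
// auth failures, rate limits, etc. should fail fast.
fn should_retry(rpc_user_err: &str) -> bool {
    rpc_user_err.contains("Minimum context slot")
        || rpc_user_err.contains("AccountNotFound")
}

fn main() {
    assert!(should_retry(
        "AccountNotFound: RPC response error -32016: \
         Minimum context slot has not been reached; "
    ));
    // A non-transient failure must not consume retry budget.
    assert!(!should_retry("401 Unauthorized: invalid api key"));
}
```

Matching on substrings is brittle if the provider rewords its errors, which is one reason the broader catch-all retry may have been chosen originally; the trade-off is wasted retry budget on permanent failures.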
728-758: Subscription leak on eviction failure: reinsertion of the evicted account is required for rollback consistency.

When `pubsub_client.unsubscribe(evicted)` fails at line 738, the code correctly removes `pubkey` from the LRU cache to prevent subscription of the new account. However, `evicted` was already removed from the LRU by `add(*pubkey)` on line 734 and is not reinserted during rollback.

This creates a mismatch: `evicted` remains subscribed in the pubsub client but is no longer tracked in the LRU cache, causing a subscription leak. The diagnostic code at lines 223-254 will detect and warn about this inconsistency.

Restore the pre-add state by reinserting `evicted`:

```diff
 if let Err(err) = self.pubsub_client.unsubscribe(evicted).await {
     warn!(
         "Failed to unsubscribe from pubsub for evicted account {evicted}: {err:?}");
     // Rollback the LRU add since eviction failed
     self.lrucache_subscribed_accounts.remove(pubkey);
+    // Restore evicted to maintain consistency between LRU and pubsub
+    self.lrucache_subscribed_accounts.add(evicted);
     return Err(err);
 }
```

Since `pubkey` was just removed, capacity is available and reinserting `evicted` will not trigger another eviction.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
- `magicblock-chainlink/src/chainlink/mod.rs` (5 hunks)
- `magicblock-chainlink/src/remote_account_provider/mod.rs` (28 hunks)
- `test-integration/test-schedule-intent/tests/test_schedule_intents.rs` (5 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
📚 Learning: 2025-11-07T14:20:31.457Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
Applied to files:
magicblock-chainlink/src/chainlink/mod.rs
magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-10-21T14:00:54.642Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Applied to files:
magicblock-chainlink/src/chainlink/mod.rs
magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.
Applied to files:
magicblock-chainlink/src/chainlink/mod.rs
magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-10-14T09:56:14.047Z
Learnt from: taco-paco
Repo: magicblock-labs/magicblock-validator PR: 564
File: test-integration/programs/flexi-counter/src/processor/call_handler.rs:122-125
Timestamp: 2025-10-14T09:56:14.047Z
Learning: The file test-integration/programs/flexi-counter/src/processor/call_handler.rs contains a test smart contract used for integration testing, not production code.
Applied to files:
test-integration/test-schedule-intent/tests/test_schedule_intents.rs
📚 Learning: 2025-11-07T13:09:52.253Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: test-kit/src/lib.rs:275-0
Timestamp: 2025-11-07T13:09:52.253Z
Learning: In test-kit, the transaction scheduler in ExecutionTestEnv is not expected to shut down during tests. Therefore, using `.unwrap()` in test helper methods like `schedule_transaction` is acceptable and will not cause issues in the test environment.
Applied to files:
test-integration/test-schedule-intent/tests/test_schedule_intents.rs
📚 Learning: 2025-10-26T16:53:29.820Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 587
File: magicblock-chainlink/src/remote_account_provider/mod.rs:134-0
Timestamp: 2025-10-26T16:53:29.820Z
Learning: In magicblock-chainlink/src/remote_account_provider/mod.rs, the `Endpoint::separate_pubsub_url_and_api_key()` method uses `split_once("?api-key=")` because the api-key parameter is always the only query parameter right after `?`. No additional query parameter parsing is needed for this use case.
Applied to files:
magicblock-chainlink/src/remote_account_provider/mod.rs
🧬 Code graph analysis (3)
magicblock-chainlink/src/chainlink/mod.rs (2)
- `magicblock-chainlink/src/chainlink/fetch_cloner.rs` (1): `new` (134-158)
- `magicblock-chainlink/src/chainlink/blacklisted_accounts.rs` (1): `blacklisted_accounts` (6-30)
test-integration/test-schedule-intent/tests/test_schedule_intents.rs (1)
test-integration/programs/flexi-counter/src/state.rs (1)
pda(32-35)
magicblock-chainlink/src/remote_account_provider/mod.rs (6)
- `magicblock-metrics/src/metrics/mod.rs` (5): `inc_account_fetches_failed` (399-401), `inc_account_fetches_found` (403-405), `inc_account_fetches_not_found` (407-409), `inc_account_fetches_success` (395-397), `set_monitored_accounts_count` (357-359)
- `magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs` (1): `new` (126-158)
- `magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs` (3): `new` (46-54), `new` (311-322), `try_new_from_url` (165-180)
- `magicblock-chainlink/src/submux/mod.rs` (1): `new` (147-158)
- `magicblock-chainlink/src/remote_account_provider/lru_cache.rs` (4): `new` (25-34), `pubkeys` (38-41), `pubkeys` (129-135), `len` (117-123)
- `magicblock-chainlink/src/remote_account_provider/config.rs` (2): `default` (65-72), `try_new_with_metrics` (27-42)
🔇 Additional comments (16)
test-integration/test-schedule-intent/tests/test_schedule_intents.rs (3)
66-69: LGTM! Verification helper appropriately placed. The undelegation verification is correctly positioned after the `schedule_intent` operation. Minor observation: line 247 includes a debug log after verification, but this call site doesn't; consider adding one for consistency with other tests.

193-196: LGTM! Multi-account verification correctly implemented. The verification helper is properly invoked with multiple pubkeys, correctly mapping from the payers vector. This ensures all undelegated accounts are verified in the ephemeral layer.

243-247: LGTM! Good example of verification with logging. This call site demonstrates the complete pattern with a debug log after verification. This pattern could be applied to other call sites (lines 69, 128) for consistency.
magicblock-chainlink/src/chainlink/mod.rs (4)
1-4: LGTM! The atomic imports are appropriate for the new counter-based account removal tracking in `reset_accounts_bank`.

139-191: LGTM! The refactored account removal logic with atomic counters provides clear telemetry. The corrected `non_empty` calculation using `saturating_sub` (from previous fixes) properly computes non-empty accounts without the `fetch_sub` pitfall.
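The `saturating_sub` point generalizes: when one gauge is derived from two independently updated atomic counters, subtracting the loaded values with saturation avoids an underflow wrap if the counters are momentarily inconsistent. A small std-only illustration (the counter names are illustrative, not the crate's):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

fn main() {
    let total_removed = AtomicUsize::new(5);
    let empty_removed = AtomicUsize::new(7); // momentarily ahead of total

    // Plain subtraction of unsigned values would wrap around to a huge
    // number here; saturating_sub on the loaded snapshots clamps at zero.
    let non_empty = total_removed
        .load(Ordering::Relaxed)
        .saturating_sub(empty_removed.load(Ordering::Relaxed));
    assert_eq!(non_empty, 0);

    // Once the counters are consistent again, the derived value is exact.
    empty_removed.store(2, Ordering::Relaxed);
    let non_empty = total_removed
        .load(Ordering::Relaxed)
        .saturating_sub(empty_removed.load(Ordering::Relaxed));
    assert_eq!(non_empty, 3);
}
```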
317-338: LGTM! The enhanced logging correctly gates on `Trace` level and uses the `trace!()` macro consistently. The `mark_empty_if_not_found` details provide valuable debugging context for account fetching.

361-377: LGTM! Upgrading undelegation logging from `trace!()` to `debug!()` is appropriate for these important operational events, making them visible without requiring trace-level verbosity.

magicblock-chainlink/src/remote_account_provider/mod.rs (9)
1-74: LGTM! The new imports support the enhanced functionality: Entry API for atomic fetch operations, metrics integration, and background task management. The `FetchResult` type alias improves code clarity, and the Helius-specific error code constant documents a provider-specific variation.
75-109: LGTM! The field rename to `lrucache_subscribed_accounts` is more descriptive, and the `_active_subscriptions_task_handle` properly maintains the background task lifetime through Drop semantics.
272-330: LGTM! The constructor properly initializes the LRU cache and conditionally starts the metrics updater based on configuration. The `NonZeroUsize` construction is safe due to the config validation invariant documented in the comment.
332-377: LGTM!The per-endpoint abort channels enable independent lifecycle management for each pubsub client, supporting the event-driven reconnection strategy mentioned in the PR objectives.
610-706: LGTM! The Entry API usage (lines 634-643) correctly appends to existing waiter lists instead of replacing them, fixing the concurrent fetch issue from previous reviews. The nested `Result` handling for `FetchResult` is appropriate for the channel-based architecture.
708-725: LGTM! The subscription setup cleanly delegates to the `subscribe` method for each pubkey, which properly handles deduplication via `is_watching` checks.
797-825: LGTM! The `unsubscribe` logic correctly maintains consistency between the LRU cache and pubsub client by only removing from the LRU after successful pubsub unsubscription (lines 809-813). The early return for non-tracked pubkeys (lines 801-807) prevents spurious errors.
849-886: LGTM! The `notify_error` helper and `retry!` macro cleanly encapsulate error handling and retry logic, properly integrating metrics and notifying all waiting requests on failure.
976-1055: LGTM! The metrics integration properly tracks found/not-found counts and updates gauges after successful fetches. The handling of `mark_empty_if_not_found` accounts (lines 991-1004) correctly distinguishes between truly missing accounts and those explicitly marked as empty.
```rust
/// Creates a background task that periodically updates the active subscriptions gauge
fn start_active_subscriptions_updater<PubsubClient: ChainPubsubClient>(
    subscribed_accounts: Arc<AccountsLruCache>,
    pubsub_client: Arc<PubsubClient>,
) -> task::JoinHandle<()> {
    task::spawn(async move {
        let mut interval = time::interval(Duration::from_millis(
            ACTIVE_SUBSCRIPTIONS_UPDATE_INTERVAL_MS,
        ));
        let never_evicted = subscribed_accounts.never_evicted_accounts();

        loop {
            interval.tick().await;
            let lru_count = subscribed_accounts.len();
            let (pubsub_total, pubsub_without_never_evict) = pubsub_client
                .subscription_count(Some(&never_evicted))
                .await;

            let all_pubsub_subs = if log::log_enabled!(log::Level::Debug) {
                pubsub_client.subscriptions()
            } else {
                vec![]
            };
            if lru_count != pubsub_without_never_evict {
                warn!(
                    "User account subscription counts LRU cache={} pubsub client={} don't match",
                    lru_count, pubsub_without_never_evict
                );
                if log::log_enabled!(log::Level::Debug) {
                    // Log all pubsub subscriptions for debugging
                    trace!(
                        "All pubsub subscriptions: {:?}",
                        all_pubsub_subs
                    );

                    // Find extra keys in pubsub that are not in LRU cache
                    let lru_pubkeys = subscribed_accounts.pubkeys();
                    let pubsub_subs_without_never_evict: HashSet<_> =
                        all_pubsub_subs
                            .iter()
                            .filter(|pk| !never_evicted.contains(pk))
                            .copied()
                            .collect();
                    let lru_pubkeys_set: HashSet<_> =
                        lru_pubkeys.into_iter().collect();

                    let extra_in_pubsub: Vec<_> =
                        pubsub_subs_without_never_evict
                            .difference(&lru_pubkeys_set)
                            .cloned()
                            .collect();
                    let extra_in_lru: Vec<_> = lru_pubkeys_set
                        .difference(&pubsub_subs_without_never_evict)
                        .cloned()
                        .collect();

                    if !extra_in_pubsub.is_empty() {
                        debug!("Extra pubkeys in pubsub client not in LRU cache: {:?}", extra_in_pubsub);
                    }
                    if !extra_in_lru.is_empty() {
                        debug!("Extra pubkeys in LRU cache not in pubsub client: {:?}", extra_in_lru);
                    }
                }
            }

            debug!("Updating active subscriptions: count={}", pubsub_total);
            trace!("All subscriptions: {}", pubkeys_str(&all_pubsub_subs));
            set_monitored_accounts_count(pubsub_total);
        }
    })
}
```
🧹 Nitpick | 🔵 Trivial
LGTM with optional improvement suggestion.
The background updater provides valuable subscription-count reconciliation. The diagnostic logging (lines 228-262) is properly gated behind debug-level checks to avoid overhead in production.
Optional: Consider graceful shutdown for the metrics task.
The background task runs indefinitely and will be aborted when the provider is dropped. For a metrics task this is acceptable, but you could add a CancellationToken parameter to allow clean shutdown if needed in the future.
fn start_active_subscriptions_updater<PubsubClient: ChainPubsubClient>(
subscribed_accounts: Arc<AccountsLruCache>,
pubsub_client: Arc<PubsubClient>,
+ shutdown: tokio_util::sync::CancellationToken,
) -> task::JoinHandle<()> {
task::spawn(async move {
let mut interval = time::interval(Duration::from_millis(
ACTIVE_SUBSCRIPTIONS_UPDATE_INTERVAL_MS,
));
let never_evicted = subscribed_accounts.never_evicted_accounts();
loop {
- interval.tick().await;
+ tokio::select! {
+ _ = interval.tick() => {},
+ _ = shutdown.cancelled() => {
+ debug!("Active subscriptions updater shutting down");
+ break;
+ }
+ }Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
magicblock-chainlink/src/remote_account_provider/mod.rs around lines 200 to 270:
the background updater currently runs forever and is only aborted when the
provider is dropped; add an explicit CancellationToken parameter to
start_active_subscriptions_updater and use it inside the spawned task to stop
the loop cleanly (e.g., use tokio::select! to await either interval.tick().await
or token.cancelled().await and break/return on cancellation), propagate the
token from the caller when spawning the task, and ensure the task exits promptly
so the JoinHandle completes gracefully.
```rust
fn verify_undelegation_in_ephem_via_owner(
    pubkeys: &[Pubkey],
    ctx: &IntegrationTestContext,
) {
    const RETRY_LIMIT: usize = 20;
    let mut retries = 0;

    loop {
        ctx.wait_for_next_slot_ephem().unwrap();
        let mut not_verified = vec![];
        for pk in pubkeys.iter() {
            let counter_pda = FlexiCounter::pda(pk).0;
            let owner = ctx.fetch_ephem_account_owner(counter_pda).unwrap();
            if owner == delegation_program_id() {
                not_verified.push(*pk);
            }
        }
        if not_verified.is_empty() {
            break;
        }
        retries += 1;
        if retries >= RETRY_LIMIT {
            panic!(
                "Failed to verify undelegation for pubkeys: {}",
                not_verified
                    .iter()
                    .map(|k| k.to_string())
                    .collect::<Vec<_>>()
                    .join(", ")
            );
        }
    }
}
```
🧹 Nitpick | 🔵 Trivial
Consider adding progress logging to aid debugging.
The verification logic is correct: it polls until all accounts' owners are no longer `delegation_program_id()` or the retry limit is exceeded. The retry limit of 20 and the polling approach are reasonable for test scenarios.
Consider adding debug logging to track verification progress, which would help diagnose issues if the retry limit is reached:
fn verify_undelegation_in_ephem_via_owner(
pubkeys: &[Pubkey],
ctx: &IntegrationTestContext,
) {
const RETRY_LIMIT: usize = 20;
let mut retries = 0;
loop {
ctx.wait_for_next_slot_ephem().unwrap();
let mut not_verified = vec![];
for pk in pubkeys.iter() {
let counter_pda = FlexiCounter::pda(pk).0;
let owner = ctx.fetch_ephem_account_owner(counter_pda).unwrap();
if owner == delegation_program_id() {
not_verified.push(*pk);
}
}
if not_verified.is_empty() {
+ debug!("✅ Verified undelegation for {} accounts", pubkeys.len());
break;
}
retries += 1;
+ debug!(
+ "⏳ Undelegation verification retry {}/{}: {} accounts remaining",
+ retries,
+ RETRY_LIMIT,
+ not_verified.len()
+ );
if retries >= RETRY_LIMIT {
panic!(
"Failed to verify undelegation for pubkeys: {}",
not_verified
.iter()
.map(|k| k.to_string())
.collect::<Vec<_>>()
.join(", ")
);
}
}
}

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In test-integration/test-schedule-intent/tests/test_schedule_intents.rs around
lines 460 to 492, add progress logging inside the polling loop to emit the
current retry count and the list of pubkeys still not verified each iteration
and a success log when verification completes; specifically, log (e.g., via
tracing::info! or println!) the retries and not_verified pubkeys before
incrementing retries so failures are easier to diagnose, and optionally log a
final confirmation once not_verified.is_empty() before breaking.
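The retry-with-limit pattern used by `verify_undelegation_in_ephem_via_owner` can be distilled into a small generic helper. A hypothetical std-only sketch (`poll_until_verified` is illustrative, not part of the test suite):

```rust
// Polls `check` until it reports no unverified items or the retry limit
// is hit; returns the stragglers on failure so the caller can report them.
fn poll_until_verified<F>(
    mut check: F,
    retry_limit: usize,
) -> Result<usize, Vec<String>>
where
    F: FnMut() -> Vec<String>,
{
    for attempt in 1..=retry_limit {
        let not_verified = check();
        if not_verified.is_empty() {
            return Ok(attempt); // how many polls it took
        }
        if attempt == retry_limit {
            return Err(not_verified);
        }
    }
    // retry_limit == 0: report whatever is currently unverified.
    Err(check())
}

fn main() {
    // Simulate accounts becoming verified on the third poll.
    let mut polls = 0;
    let result = poll_until_verified(
        || {
            polls += 1;
            if polls < 3 { vec!["acc1".to_string()] } else { vec![] }
        },
        20,
    );
    assert_eq!(result, Ok(3));

    // An account that never flips owner exhausts the limit.
    let stuck = poll_until_verified(|| vec!["acc2".to_string()], 2);
    assert_eq!(stuck, Err(vec!["acc2".to_string()]));
}
```

Returning the attempt count also gives the caller a natural hook for the progress logging the review suggests.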
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
- `magicblock-chainlink/src/chainlink/fetch_cloner.rs` (5 hunks)
- `magicblock-chainlink/src/chainlink/mod.rs` (5 hunks)
- `magicblock-metrics/src/metrics/mod.rs` (9 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
📚 Learning: 2025-10-14T09:56:14.047Z
Learnt from: taco-paco
Repo: magicblock-labs/magicblock-validator PR: 564
File: test-integration/programs/flexi-counter/src/processor/call_handler.rs:122-125
Timestamp: 2025-10-14T09:56:14.047Z
Learning: The file test-integration/programs/flexi-counter/src/processor/call_handler.rs contains a test smart contract used for integration testing, not production code.
Applied to files:
magicblock-chainlink/src/chainlink/mod.rs
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.
Applied to files:
magicblock-chainlink/src/chainlink/mod.rs
📚 Learning: 2025-10-21T14:00:54.642Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Applied to files:
magicblock-chainlink/src/chainlink/mod.rs
magicblock-chainlink/src/chainlink/fetch_cloner.rs
magicblock-metrics/src/metrics/mod.rs
📚 Learning: 2025-11-07T14:20:31.457Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
Applied to files:
magicblock-chainlink/src/chainlink/mod.rs
magicblock-chainlink/src/chainlink/fetch_cloner.rs
📚 Learning: 2025-10-26T16:53:29.820Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 587
File: magicblock-chainlink/src/remote_account_provider/mod.rs:134-0
Timestamp: 2025-10-26T16:53:29.820Z
Learning: In magicblock-chainlink/src/remote_account_provider/mod.rs, the `Endpoint::separate_pubsub_url_and_api_key()` method uses `split_once("?api-key=")` because the api-key parameter is always the only query parameter right after `?`. No additional query parameter parsing is needed for this use case.
Applied to files:
magicblock-chainlink/src/chainlink/fetch_cloner.rs
🧬 Code graph analysis (2)
magicblock-chainlink/src/chainlink/mod.rs (2)
magicblock-chainlink/src/chainlink/blacklisted_accounts.rs (1)
`blacklisted_accounts` (6-30)
magicblock-metrics/src/metrics/mod.rs (1)
inc_undelegation_requested(427-429)
magicblock-chainlink/src/chainlink/fetch_cloner.rs (2)
magicblock-metrics/src/metrics/mod.rs (1)
`inc_undelegation_completed` (431-433)
magicblock-chainlink/src/remote_account_provider/config.rs (1)
try_new_with_metrics(27-42)
```rust
                        if let Some(in_bank) =
                            self.accounts_bank.get_account(&pubkey)
                        {
                            if !in_bank.delegated()
                                && in_bank.owner().eq(&dlp::id())
                                && !account.owner().eq(&dlp::id())
                            {
                                debug!(
                                    "Undelegation completed for account: {pubkey}"
                                );
                                magicblock_metrics::metrics::inc_undelegation_completed();
                            }
```
Fix undelegation completion detection logic.
During cloning we explicitly set delegated accounts to `delegated = true` and overwrite their owner with the delegation record's owner (see the `set_owner(...).set_delegated(is_delegated_to_us)` path in this file). That means a still-delegated account lives in the bank with `delegated() == true` and `owner() != dlp::id()`. The new guard checks the exact opposite (`!in_bank.delegated()` and `in_bank.owner() == dlp::id()`), so it never trips for the happy path. As a result `inc_undelegation_completed()` will never run, making the new metric useless. Flip the predicate to look for an entry we previously considered delegated and an update whose owner is no longer `dlp::id()`, e.g.:
```diff
- if !in_bank.delegated()
-     && in_bank.owner().eq(&dlp::id())
-     && !account.owner().eq(&dlp::id())
- {
+ if in_bank.delegated()
+     && !account.delegated()
+     && !account.owner().eq(&dlp::id())
+ {
```

This fires exactly once on the first post-undelegation update and keeps the metric aligned with the new observability story.
🤖 Prompt for AI Agents
In magicblock-chainlink/src/chainlink/fetch_cloner.rs around lines 224 to 235,
the undelegation completion check is inverted and never matches the actual
cloned state; change the guard to detect an entry that was previously marked
delegated in the bank (in_bank.delegated() == true and in_bank.owner() ==
dlp::id()) and which is now updated so its owner is no longer dlp::id() (i.e.
!account.owner().eq(&dlp::id())), then run the debug log and
magicblock_metrics::metrics::inc_undelegation_completed() in that branch so the
metric fires once on the first post-undelegation update.
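As a sanity check on the flipped predicate, here is a minimal self-contained sketch; the `Acct` struct and its fields are illustrative stand-ins for the bank entry and incoming update types, not the real account API:

```rust
// Illustrative stand-in for a bank entry or incoming account update.
#[derive(Clone, Copy)]
struct Acct {
    delegated: bool,
    owned_by_dlp: bool, // stand-in for owner() == dlp::id()
}

// Corrected predicate: the bank still considers the account delegated,
// while the incoming update is no longer delegated nor owned by dlp.
fn undelegation_completed(in_bank: Acct, update: Acct) -> bool {
    in_bank.delegated && !update.delegated && !update.owned_by_dlp
}

fn main() {
    let in_bank = Acct { delegated: true, owned_by_dlp: false };
    let post_undelegation = Acct { delegated: false, owned_by_dlp: false };
    let still_delegated = Acct { delegated: true, owned_by_dlp: false };
    // Fires on the first post-undelegation update...
    assert!(undelegation_completed(in_bank, post_undelegation));
    // ...but not while the account remains delegated.
    assert!(!undelegation_completed(in_bank, still_delegated));
    println!("predicate ok");
}
```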
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
magicblock-ledger/src/ledger_truncator.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
📚 Learning: 2025-10-28T13:15:42.706Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 596
File: magicblock-processor/src/scheduler.rs:1-1
Timestamp: 2025-10-28T13:15:42.706Z
Learning: In magicblock-processor, transaction indexes were always set to 0 even before the changes in PR #596. The proper transaction indexing within slots will be addressed during the planned ledger rewrite.
Applied to files:
magicblock-ledger/src/ledger_truncator.rs
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.
Applied to files:
magicblock-ledger/src/ledger_truncator.rs
🔇 Additional comments (1)
magicblock-ledger/src/ledger_truncator.rs (1)
206-207: Good use of `spawn_blocking` to offload heavy deletion work.

Moving RocksDB deletions to a blocking task prevents stalling the Tokio async runtime, which improves responsiveness during large truncation operations.
```rust
if let Err(err) = delete_handle.await {
    error!("Ledger delete task cancelled: {err}");
}

Self::compact_slot_range(ledger, from_slot, to_slot).await;
```
Consider skipping compaction if the deletion task fails.
If the blocking deletion task is cancelled or panics (line 233), the function logs the error but still proceeds to compact (line 237). This means compaction runs even when slot deletions may have failed or only partially completed, which could be wasteful or lead to unexpected ledger state.
Consider returning early if `delete_handle.await` fails, or at minimum document why compaction should proceed despite deletion failures.
Apply this diff to skip compaction when deletion fails:
```diff
- if let Err(err) = delete_handle.await {
-     error!("Ledger delete task cancelled: {err}");
+ match delete_handle.await {
+     Ok(_) => {
+         Self::compact_slot_range(ledger, from_slot, to_slot).await;
+     }
+     Err(err) => {
+         error!("Ledger delete task cancelled, skipping compaction: {err}");
+         return;
+     }
  }
-
- Self::compact_slot_range(ledger, from_slot, to_slot).await;
```

🤖 Prompt for AI Agents
In magicblock-ledger/src/ledger_truncator.rs around lines 233 to 237, the code
logs an error when delete_handle.await fails but still proceeds to call
Self::compact_slot_range(...); modify the flow to skip compaction when the
deletion task failed by returning early (or otherwise short-circuiting) after
logging the error so compaction only runs on successful deletion; ensure the
function returns or exits appropriately in the error branch and keep the
existing log message (or augment it) to preserve diagnostics.
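The suggested early return is an instance of a general pattern: gate a follow-up step on a joined task's success. A dependency-free sketch of that pattern, using `std::thread` in place of tokio's `spawn_blocking` and illustrative names throughout:

```rust
use std::thread;

// Stand-in for the deletion work; a panicking thread models a
// failed/cancelled blocking task whose join() returns Err.
fn run_truncation(delete_ok: bool) -> &'static str {
    let handle = thread::spawn(move || {
        if !delete_ok {
            panic!("delete failed");
        }
    });
    // Only run the follow-up (compaction) when the task joined successfully.
    match handle.join() {
        Ok(()) => "compacted",
        Err(_) => "skipped compaction",
    }
}

fn main() {
    assert_eq!(run_truncation(true), "compacted");
    assert_eq!(run_truncation(false), "skipped compaction");
    println!("ok");
}
```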
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (1)
Cargo.toml (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
📚 Learning: 2025-10-26T16:54:39.084Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 587
File: test-manual/Cargo.toml:0-0
Timestamp: 2025-10-26T16:54:39.084Z
Learning: In the magicblock-validator repository, use git branch references (not commit hashes or tags) for the helius-laserstream dependency to allow automatic updates when the branch is pushed to.
Applied to files:
Cargo.toml
🔇 Additional comments (2)
Cargo.toml (2)
94-94: Confirm the log feature removal is intentional.

Removing `features = ["release_max_level_info"]` means debug and trace logs will now compile into release builds. This increases binary size and runtime logging overhead in production. Verify this aligns with the PR's metrics/logging objectives and that the trade-off is acceptable.
94-232: Verification complete—no issues found.

The `ureq = "2.9.6"` dependency is correctly added in `test-integration/Cargo.toml`, with `test-integration/test-tools/Cargo.toml` referencing it as a workspace dependency. The `url = "2.5.0"` dependency is present in the root workspace Cargo.toml (line 218 in the provided snippet) and also declared in integration test modules. Both dependencies are properly positioned for integration test metrics fetching as described in the PR objectives.
This reverts commit a8900e1.
Summary by CodeRabbit
New Features
Bug Fixes
Improvements
Enhances subscription management and reliability by adding metrics and robust reconnection logic with automatic resubscription.
Details
Subscription Management Improvements
Metrics and Monitoring
`subscription_count` method added to `ChainPubsubClient` trait for tracking active subscriptions across all clients
Metrics support in `RemoteAccountProviderConfig`
Reconnection and Reliability
`ReconnectableClient` trait
`07_subscription_limits.rs` to test large-scale subscription scenarios (400 accounts)
`ReconnectableClient` trait for better abstraction
`ureq` and `url` for metrics fetching in integration tests
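The reconnection-with-resubscription flow described above boils down to a retry loop with backoff, after which all tracked subscriptions are re-established. A minimal synchronous sketch of that loop (the `connect` closure, delays, and return shape are illustrative, not the actual `ReconnectableClient` API):

```rust
use std::time::Duration;

// Retry a connect attempt with exponential backoff, up to `max_attempts`.
// Returns the attempt number that succeeded, or None if all attempts failed.
fn reconnect_with_backoff(
    mut connect: impl FnMut() -> bool,
    max_attempts: u32,
) -> Option<u32> {
    let mut delay = Duration::from_millis(10);
    for attempt in 1..=max_attempts {
        if connect() {
            // On success the real client would resubscribe all tracked accounts.
            return Some(attempt);
        }
        std::thread::sleep(delay);
        delay = delay.saturating_mul(2); // exponential backoff
    }
    None
}

fn main() {
    let mut calls = 0;
    let attempt = reconnect_with_backoff(
        || {
            calls += 1;
            calls >= 3 // fail twice, then succeed on the third try
        },
        5,
    );
    assert_eq!(attempt, Some(3));
    println!("reconnected on attempt {}", attempt.unwrap());
}
```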